Website Identification DEA Internship Report
Abstract
In this paper, I present a method for discovering the set of webpages that make up a logical website, based on the link structure of the Web graph. Such a method helps identify the boundaries of what to crawl in the context of Web archiving. To this end, I combine an online version of the preflow-push algorithm, an algorithm for the maximum flow problem in traffic networks, with the Markov CLuster (MCL) algorithm. MCL is run on a crawled portion of the Web graph to build a seed of initial webpages, and the online preflow-push algorithm then extends this seed. I describe experiments on subsites of the INRIA website, which give satisfactory results.
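To make the clustering step more concrete, here is a minimal sketch of the generic MCL iteration (alternating expansion and inflation on a column-stochastic matrix). It is not the implementation used in the report: the function names, the dense list-of-lists matrix, the added self-loops, the inflation value of 2.0, the fixed iteration count, and the 1e-6 threshold are all assumptions chosen for illustration, and the toy graph stands in for a crawled portion of the Web graph.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, threshold=1e-6):
    """Cluster an undirected graph given as a 0/1 adjacency matrix.

    Hypothetical helper for illustration: parameters are assumed defaults,
    not values taken from the report.
    """
    n = len(adjacency)
    # Add self-loops, then turn the matrix into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along two-step walks
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row with non-negligible entries lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > threshold)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
    bridged = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(bridged))
```

In the setting described above, a cluster produced this way would play the role of the seed of initial webpages; extending that seed with the online preflow-push algorithm is a separate step that this sketch does not cover.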